
@yannicks1 (Collaborator)

[v1] remove v0 code

Now that we have v1 support for embedding models (#277), we can finally delete the v0 code.
Note: for decoder models, v0 support was deprecated some time ago.

Signed-off-by: Yannick Schnider <[email protected]>
@github-actions (bot)

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

```diff
-    if is_decoder and not envs.VLLM_USE_V1:
-        raise ValueError("Decoder models are only supported on v1")
+    if not envs.VLLM_USE_V1:
+        raise ValueError("vllm-spyre is only supported with vLLM v1")
```
(Collaborator)

Suggested change:
```diff
-raise ValueError("vllm-spyre is only supported with vLLM v1")
+raise ValueError("vllm-spyre is only supported with vLLM v1. Please set VLLM_USE_V1=1")
```


```diff
     monkeypatch.setenv("VLLM_SPYRE_DYNAMO_BACKEND", backend)
-    monkeypatch.setenv("VLLM_USE_V1", "1" if vllm_version == "V1" else "0")
+    monkeypatch.setenv("VLLM_USE_V1", "1")
```
(Collaborator)

Maybe we can replace all of these with a single os.environ["VLLM_USE_V1"] = "1" in conftest.py?
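
A minimal sketch of that idea (assuming a top-level conftest.py; setting the variable at import time keeps it in place before vLLM reads it — note that os.environ values must be strings):

```python
# conftest.py (sketch)
import os

# Force the V1 engine for the entire test session.
# Environment values must be strings: "1", not 1.
os.environ["VLLM_USE_V1"] = "1"
```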

@maxdebayser (Collaborator) · Jul 29, 2025

But this env var is not needed anymore, right? Shouldn't we add an assert in the LLM engine to verify that it is an instance of the V1 engine class?
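
A hedged sketch of such an assert (the import path follows vLLM's v1 module layout at the time of writing; treat the exact path and the call site as assumptions):

```python
# Sketch: fail fast if vLLM silently fell back to the V0 engine.
from vllm.v1.engine.llm_engine import LLMEngine as V1LLMEngine

def assert_v1_engine(engine: object) -> None:
    assert isinstance(engine, V1LLMEngine), (
        "vllm-spyre requires the vLLM V1 engine")
```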

(Collaborator, Author)

We already have this check in platform.py. Don't you think that is enough, so that we can safely remove all of the monkeypatch.setenv("VLLM_USE_V1", "1") calls?

@maxdebayser (Collaborator) · Jul 29, 2025

I think at some point envs.VLLM_USE_V1 will be removed. Also, this flag doesn't guarantee that the current vLLM instance is not running as V0, since vLLM can currently fall back to V0 if VLLM_USE_V1 is unset.

(Collaborator)

Can we check this flag with os.getenv instead? That way the plugin won't crash when envs.VLLM_USE_V1 is removed upstream.
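
One way this could look in platform.py (a sketch; defaulting to "1" mirrors vLLM enabling v1 by default and is an assumption):

```python
import os

# Read the raw environment instead of envs.VLLM_USE_V1 so this check
# keeps working even after the flag is removed upstream.
if os.getenv("VLLM_USE_V1", "1") != "1":
    raise ValueError(
        "vllm-spyre is only supported with vLLM v1. "
        "Please set VLLM_USE_V1=1")
```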

(Collaborator, Author)

We have VLLM_USE_V1 in other places in the code. These will have to be removed anyway when envs.VLLM_USE_V1 is removed upstream...

(Collaborator)

But aren't they removed in this PR?

(Collaborator, Author)

I saw that the only remaining occurrences were for a hack testing both the v0 and v1 engines. As we don't use that anymore, I removed it and incorporated your check via os.getenv. See this commit.

@maxdebayser (Collaborator)

Nice!

Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1 (Collaborator, Author)

I made a very radical change and removed all lines where we set VLLM_USE_V1=1. We don't support any v0 code in vllm-spyre anymore. vLLM enables v1 by default here; we do a check in platform.py here and tell the user to set it if it is unset for any reason. IMO this is safe enough. What do you think @joerunde @maxdebayser?

yannicks1 marked this pull request as ready for review July 29, 2025 20:12
yannicks1 self-assigned this Jul 29, 2025
@joerunde (Collaborator)

Yeah, I think this is good enough then; this prevents anybody from running with VLLM_USE_V1=0.

@joerunde (Collaborator)

@maxdebayser, do we need to do a performance comparison of embeddings on v0 vs v1 before deleting? Or are we good to go?

Signed-off-by: Yannick Schnider <[email protected]>
@maxdebayser (Collaborator)

@joerunde, I think we're good to go; I can run the V0 tests on a frozen version.

@waleedqk (Collaborator)

bot:test
MARKERS="spyre"

@waleedqk (Collaborator)

bot:test
MARKERS="spyre and not quantized and not multi and not cb"

```diff
 | Speculative Decoding | 🗓️ | |
 | Guided Decoding | 🗓️ | |
-| Pooling | ⚠️ | Works with V0. V1 still being developed in vLLM [vllm#18052](https://github.com/vllm-project/vllm/issues/18052) |
+| Pooling | | |
```
(Collaborator)

We already have Embedding models at the end of this table - is that still needed?

(Collaborator, Author)

question for @maxdebayser

(Collaborator)

Since we don't support all pooling applications, I think it's better to remove this and leave just Embedding below.

Signed-off-by: Yannick Schnider <[email protected]>
@joerunde (Collaborator) left a comment:

negative diff let's goooo

Signed-off-by: Yannick Schnider <[email protected]>
yannicks1 enabled auto-merge (squash) July 31, 2025 14:21
github-actions bot added the ready label Jul 31, 2025
yannicks1 merged commit 8e7d565 into main Jul 31, 2025
17 of 19 checks passed
yannicks1 deleted the ysc-prune-v0 branch July 31, 2025 15:28
yannicks1 added a commit that referenced this pull request Aug 4, 2025
### [docs] remove pooling models from supported features

Following up on a (late) discussion in #344 about removing the pooling
models from the list of supported models, since not all pooling
applications are supported and we already have embedding models in that
list (see comment by @maxdebayser:
[link](https://github.com/vllm-project/vllm-spyre/pull/344/files#r2247822334))

---------

Signed-off-by: Yannick Schnider <[email protected]>